HADOOP-19696. hadoop binary distribution to move cloud connectors to hadoop common/lib #7980
base: trunk
By contrast: 3.4.2
Having audited the files pulled in by the cloud connectors, we have about a dozen whose licenses aren't listed in the binary license. The analyticsaccelerator one is @ahmarsuhail's work to add to the license; not sure about the others. Proposed: identify which connector each unacknowledged artifact comes from, and create homework for each team.
💔 -1 overall
This message was automatically generated. |
Not very familiar with how the packaging works, so I'm finding this a bit difficult to review. How are you testing the packaging? I just ran
are all the same before and after your changes, so I must be doing something wrong.
Did you do a
the big distro created under
Force-pushed from c59e351 to 0aaa6ce.
The latest build generates stack traces from the gcs and obs filesystems, whose classpath is incomplete when the service loader instantiates them. Both need to move to being declared in core-default.xml only, which is faster anyway.
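For context, as we understand Hadoop's `FileSystem` lookup: every class listed in a `META-INF/services/org.apache.hadoop.fs.FileSystem` file is instantiated by the service loader the first time any filesystem is looked up, so a connector with missing dependencies fails even if nobody uses its scheme. An `fs.<scheme>.impl` entry in core-default.xml instead defers class loading until that scheme is actually used. A minimal sketch of such an entry for obs follows; the class name `org.apache.hadoop.fs.obs.OBSFileSystem` is an assumption to verify against the hadoop-huaweicloud module, and gcs would need an equivalent entry naming its own class.

```xml
<!-- Sketch: bind the obs:// scheme through configuration rather than a
     META-INF/services registration, so the connector class is only loaded
     when an obs:// path is used. The matching services-file entry would be
     removed. The class name below is an assumption. -->
<property>
  <name>fs.obs.impl</name>
  <value>org.apache.hadoop.fs.obs.OBSFileSystem</value>
  <description>FileSystem implementation for obs:// URIs (assumed class name).</description>
</property>
```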
Shaded test failures.
Force-pushed from 378f021 to 4c10b3e.
BUILDING.txt
----------------------------------------------------------------------------------
Including Cloud Connector Dependencies in Distributions:

Hadoop distributions include the hadoop modules need to work with data and services
Nitpick: "modules needed".
hadoop-cloud-storage-project/pom.xml
<module>hadoop-cos</module>
<module>hadoop-huaweicloud</module>
<module>hadoop-tos</module>
<module>hadoop-cloud-storage-dist</module>
It seems like we would never need to enter execution of hadoop-cloud-storage-dist unless we are building a distro (activating -Pdist). Should we also wrap inclusion of the sub-module here behind activation of the dist profile?
Valid point. Will do, as it'll save disk space as well as time.
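A minimal sketch of the suggested change, assuming the profile id is `dist` (matching `-Pdist`) and that the other entries in the `<modules>` list stay as they are. Maven only adds a profile's `<modules>` to the reactor when that profile is active, so a plain build never enters hadoop-cloud-storage-dist. This illustrates the reviewer's idea; it is not necessarily how the PR implements it.

```xml
<!-- Sketch for hadoop-cloud-storage-project/pom.xml -->
<modules>
  <!-- other connector modules elided, unchanged -->
  <module>hadoop-cos</module>
  <module>hadoop-huaweicloud</module>
  <module>hadoop-tos</module>
  <!-- hadoop-cloud-storage-dist removed from the default list -->
</modules>

<profiles>
  <profile>
    <!-- activated with -Pdist; only then is the assembly module built -->
    <id>dist</id>
    <modules>
      <module>hadoop-cloud-storage-dist</module>
    </modules>
  </profile>
</profiles>
```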
…hadoop common/lib
* new assembly for hadoop cloud storage
* hadoop-cloud-storage does the assembly on -Pdist
* layout stitching to move into share/hadoop/common/lib
* remove connectors from hadoop-tools-dist
* cut old jackson version from huawei cloud dependency: even though it was being upgraded by our own artifacts, it was a complication.
* add the artifacts found with the relevant hadoop-* modules to the binary license
* leave all three with cloud-storage dependencies such that they don't include these in a pull of hadoop-cloud-storage (regression?)
* unless specific profiles cos, huawei and aliyun are declared, at which point they're exported by hadoop-cloud-storage and put into the assembly.

This avoids dealing with complex dependencies we don't want (okio, more xml parsers,...), while making it straightforward to build a distro with them if you want.

bundle.jar is always getting in. Do I do it here iff -Paws is set, or do I delay it until the copy to the final distro artifact tree takes place?
* delay: keeps it as an export of the hadoop-cloud-storage pom
* early: consistent with the rest
* Unshade tos
* explicit declaration of apache http dependencies, with excludes as needed
* updated LICENSE-binary
This ensures that anything done dependency-wise for packaging doesn't impact the hadoop-cloud-storage module and any downstream uses.
- separate profile for each component to pull in all dependencies
- hadoop-azure is always included, hadoop-aws *except* bundle.jar
- hadoop-gcp and hadoop-tos are complete iff shaded
- updated BUILDING.txt

This is enough to let anyone cut a release with their choice of functional cloud connectors.
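A minimal sketch of what one per-component profile could look like, using the aws case where `bundle.jar` is left out by default. The profile id `aws`, its placement in the hadoop-cloud-storage-dist pom, and the assumption that hadoop-project's dependencyManagement supplies the AWS SDK version are all illustrative guesses, not confirmed details of this PR.

```xml
<!-- Hypothetical profile in the hadoop-cloud-storage-dist pom.
     Building with e.g. -Pdist,aws adds the AWS SDK v2 bundle to the
     distribution; without -Paws, hadoop-aws ships without it. -->
<profiles>
  <profile>
    <id>aws</id>
    <dependencies>
      <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>bundle</artifactId>
        <!-- version assumed to be managed by hadoop-project -->
        <scope>compile</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

Keeping each optional SDK behind its own profile means the default distribution stays small, while a release manager opts in connector by connector.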
The hadoop-cloud-storage-dist module is now only executed when the dist profile is set.
Force-pushed from 0cb1061 to e39cd33.
Excluding the extra binaries:
* Keeps release artifact size below the limit of the ASF distribution network.
* Reduces download and size overhead in docker usage.
* Reduces the CVE attack surface and audit-related complaints about those same ScVES.
Nitpick: "CVEs."
How was this patch tested?
Manual build, review, storediag, hadoop fs commands
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?